A Large-Scale Empirical Analysis of Chinese Web Passwords
نویسندگان
چکیده
Users speaking different languages may prefer different patterns in creating their passwords, and thus knowledge on English passwords cannot help to guess passwords from other languages well. Research has already shown Chinese passwords are one of the most difficult ones to guess. We believe that the conclusion is biased because, to the best of our knowledge, little empirical study has examined regional differences of passwords on a large scale, especially on Chinese passwords. In this paper, we study the differences between passwords from Chinese and English speaking users, leveraging over 100 million leaked and publicly available passwords from Chinese and international websites in recent years. We found that Chinese prefer digits when composing their passwords while English users prefer letters, especially lowercase letters. However, their strength against password guessing is similar. Second, we observe that both users prefer to use the patterns that they are familiar with, e.g., Chinese Pinyins for Chinese and English words for English users. Third, we observe that both Chinese and English users prefer their conventional format when they use dates to construct passwords. Based on these observations, we improve a PCFG (Probabilistic Context-Free Grammar) based password guessing method by inserting Pinyins (about 2.3% more entries) into the attack dictionary and insert our observed composition rules into the guessing rule set. As a result, our experiments show that the efficiency of password guessing increases by 34%.
منابع مشابه
A A Large-Scale Evaluation of High-Impact Password Strength Meters
Passwords are ubiquitous in our daily digital lives. They protect various types of assets ranging from a simple account on an online newspaper website to our health information on government websites. However, due to the inherent value they protect, attackers have developed insights into cracking/guessing passwords both offline and online. In many cases, users are forced to choose stronger pass...
متن کاملThe Scale-free Network of Passwords : Visualization and Estimation of Empirical Passwords
In this paper, we present a novel vision of large scale of empirical password sets available and improve the understanding of passwords by revealing their interconnections and considering the security on a level of the whole password set instead of one single password level. Through the visualization of Yahoo, Phpbb, 12306, etc. we, for the first time, show what the spatial structure of empiric...
متن کاملCharacter encoding issues for web passwords
Password authentication remains ubiquitous on the web, primarily because of its low cost and compatibility with any device which allows a user to input text. Yet text is not universal. Computers must use a character encoding system to convert human-comprehensible writing into bits. We examine for the first time the lingering effects of character encoding on the password ecosystem. We report a n...
متن کاملThe Password Game: Negative Externalities from Weak Password Practices
The combination of username and password is widely used as a human authentication mechanism on the Web. Despite this universal adoption and despite their long tradition, password schemes exhibit a high number of security flaws which jeopardise the confidentiality and integrity of personal information. As Web users tend to reuse the same password for several sites, security negligence at any one...
متن کاملThe Password Thicket: Technical and Market Failures in Human Authentication on the Web
We report the results of the first large-scale empirical analysis of password implementations deployed on the Internet. Our study included 150 websites which offer free user accounts for a variety of purposes, including the most popular destinations on the web and a random sample of e-commerce, news, and communication websites. Although all sites evaluated relied on user-chosen textual password...
متن کامل